Onomastics 2.0 - The Power of Social Co-Occurrences

Authors

  • Folke Mitzlaff
  • Gerd Stumme
Abstract

Onomastics is “the science or study of the origin and forms of proper names of persons or places.” Personal names in particular play an important role in daily life, as future parents all over the world face the task of finding a suitable given name for their child. This choice is influenced by several factors, such as social context, language, cultural background and, in particular, personal taste. With the rise of the Social Web and its applications, users increasingly interact digitally and participate in the creation of heterogeneous, distributed, collaborative data collections. These data sources also reflect current and new naming trends as well as newly emerging interrelations among names. The present work shows how basic approaches from the fields of social network analysis and information retrieval can be applied to discover relations among names, thus extending Onomastics with data mining techniques. The approach starts by building co-occurrence graphs from Social Web data, for given names and for city names, respectively. As a main result, correlations are observed between semantically grounded similarities among names (e.g., geographical distance for city names) and structural, graph-based similarities. The discovered relations among given names are the foundation of the Nameling, a search engine and academic research platform for given names that attracted more than 30,000 users within four months, underpinning the relevance of the proposed methodology.
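
The idea of a co-occurrence graph with a structural similarity on top of it can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy input format (one list of names per document) and the choice of cosine similarity over weighted neighborhoods are assumptions made for illustration only; the abstract does not say which similarity measures were used.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def build_cooccurrence_graph(documents):
    """Build a weighted, undirected graph as {name: Counter({neighbor: weight})}.

    `documents` is a hypothetical input: an iterable of name lists, one list
    per text unit (e.g. a sentence or a post) in which the names appear together.
    """
    graph = defaultdict(Counter)
    for names in documents:
        # Each unordered pair of distinct names counts once per document.
        for a, b in combinations(sorted(set(names)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def cosine_similarity(graph, a, b):
    """Cosine of the weighted neighborhood vectors of `a` and `b`.

    Assumed measure for illustration; returns 0.0 if either name is unknown.
    """
    va = graph.get(a, Counter())
    vb = graph.get(b, Counter())
    dot = sum(weight * vb[neighbor] for neighbor, weight in va.items())
    norm_a = sqrt(sum(w * w for w in va.values()))
    norm_b = sqrt(sum(w * w for w in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy data, invented for this example.
    toy_documents = [
        ["Emma", "Anna", "Paul"],
        ["Emma", "Anna"],
        ["Paul", "Peter"],
        ["Anna", "Peter", "Emma"],
    ]
    toy_graph = build_cooccurrence_graph(toy_documents)
    print(cosine_similarity(toy_graph, "Paul", "Peter"))
```

In this sketch two names are similar when they co-occur with the same other names at similar frequencies, which is one way to obtain a "structural" similarity that could then be compared against an external ground truth such as geographical distance between cities.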

Similar resources

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to it. The first step in text classification is to represent documents in a suitable way t...

Using jWebMiner 2.0 to Improve Music Classification Performance by Combining Different Types of Features Mined from the Web

This paper presents the jWebMiner 2.0 cultural feature extraction software and describes the results of several musical genre classification experiments performed with it. jWebMiner 2.0 is an easy-to-use and open-source tool that allows users to mine the Internet in order to extract features based on both Last.fm social tags and general web search string co-occurrences extracted using the Yahoo...

Image Steganalysis Based on Co-Occurrences of Integer Wavelet Coefficients

We present a steganalysis scheme for LSB matching steganography based on feature vectors extracted from the integer wavelet transform (IWT). In the integer wavelet decomposition of an image, the coefficients are integers, so we can calculate their co-occurrence matrix without rounding the coefficients. Before calculation of co-occurrence matrices, we clip some of the most significant bitplanes of ...

Network Analysis and Key Actors for Wildlife Management (Study Area: Caucasian Black Grouse Habitat, Arasbaran Biosphere Reserve)

One of the most important approaches to policy making for biodiversity conservation and wildlife management is co-management of natural resources. Local stakeholders are one of the main elements in this approach. It is necessary to consider social network analysis in the framework of social-ecological systems toward biodiversity conservation and sustainable wildlife management. In th...

Rule Discovery and Probabilistic Modeling for Onomastic Data

The naming of natural features, such as hills, lakes, springs, meadows, etc., provides a wealth of linguistic information; the study of the names and naming systems is called onomastics. We consider a data set containing all names and locations of about 58,000 lakes in Finland. Using computational techniques, we address two major onomastic themes. First, we address the existence of local dependen...

Journal:
  • CoRR

Volume abs/1303.0484

Publication date 2013